cmake: add find_package(lux-gpu-kernels) discovery hook #1
Merged
The proprietary high-performance GPU kernels (Metal/CUDA/WGSL) for
crypto schemes, AIVM, and FHE have been extracted into the private
repository lux-private/gpu-kernels (commercial license required).
This commit adds a non-breaking discovery hook to the top-level
CMakeLists.txt:
- find_package(lux-gpu-kernels CONFIG QUIET) probes for the private
install via CMAKE_PREFIX_PATH.
- When found, the in-tree gpu/<backend> subdirs are symlinked from
the install root, allowing per-scheme CMakeLists to compile the
real GPU drivers from the canonical private source.
- When not found, CRYPTO_ENABLE_{CUDA,METAL,WGSL} are forced OFF
and the build proceeds with the in-tree CPU implementations
unchanged. The cevm-genesis-parity test passes byte-equal in
this mode (state root + genesis hash match).
In-tree gpu/ subdirs remain in this commit to keep historical
consumers building unchanged; a follow-up will gate per-scheme
add_library(... gpu/...) calls behind CRYPTO_ENABLE_* and delete
the in-tree kernels.
Commercial license inquiries: licensing@lux.network
The find_package(lux-gpu-kernels) hook (6b632c7) symlinks each scheme's
gpu/<backend>/ from lux-private at configure time and force-disables
CRYPTO_ENABLE_{CUDA,METAL,WGSL} when the private repo is absent. That
made the gates exist as cache vars, but every per-scheme CMakeLists still
unconditionally referenced gpu/<backend>/<driver>.{cpp,cu,wgsl} via
add_library / file(READ), so deleting the in-tree gpu/ subtrees would
break configure + build even with the options flipped off.

This commit wraps every gpu-touching add_library + file(READ) + set(...)
block in its scheme's per-backend CRYPTO_ENABLE_* guard:
- aead
- blake2b
- bls (stages 1-4 + combined-miller + cuda-stub + wgsl-driver)
- cggmp21
- frost
- gpukit (cuda driver + multi_pippenger cuda source)
- lamport
- math (lattice_ring_cuda)
- ntt (large-N cuda/metal/wgsl)
- ripemd160
- secp256k1 (batch_inv + ecrecover cuda + wgsl)
- sha256
- fhe (host CUDA shims + stub for the lattice_ring_cuda_* C ABI when
  lux-private is absent)

Tests in the top-level CMakeLists are gated in parallel with the lib gates:
- sha256_cuda_test, sha256_wgpu_test
- frost_presign_test, cggmp21_presign_test
- ripemd160_cuda_test, ripemd160_wgpu_test
- bn254_gpu_determinism_test + bn254_pairing_consts codegen
- modexp_karatsuba_gpu_test
- kzg_gpu_determinism_test
- ringtail_lattice_ring_bench, ringtail_lattice_ring_sweep_bench (needs
  METAL+WGSL because the source statically references both)
- pedersen_tree_{metal,cuda,wgpu}_determinism_test
- batch_inv_{cuda,wgsl}_test
- banderwagon_cuda_determinism_test, banderwagon_wgsl_determinism_test
- ntt_large_test
- gpukit tests (all backends required because the source statically
  resolves gpukit_*_{metal,cuda,wgsl})

Drive-by fixes to the umbrella crypto target loop:
- skip "math": it is a substrate with math_codec/math_modarith/...; no
  umbrella math target ever existed, and the link line was emitting -lmath
- use if(TARGET ${_alg}) so the loop tolerates lazy/absent targets

New file fhe/cpp/backends/cuda/lattice_ring_cuda_stub.cpp provides
CPU-only stub bodies for the six extern "C" lattice_ring_cuda_* symbols
that the FHE host dispatcher references unconditionally. Available()
returns 0 so the dispatcher routes to the CPU oracle; every NTT/MUL entry
point returns -1 (NOTIMPL) so any accidental call surfaces immediately.

CPU-only configure + build: PASS (crypto static + all linkable tests).
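The stub described above can be sketched as follows. This is a minimal illustration, not the contents of lattice_ring_cuda_stub.cpp: the source only states that there are six extern "C" lattice_ring_cuda_* symbols, that Available() returns 0, and that the NTT/MUL entry points return -1. The names `lattice_ring_cuda_available`, `lattice_ring_cuda_ntt_forward`, `lattice_ring_cuda_mul`, their signatures, and the constant `kLatticeRingCudaNotImpl` are hypothetical, and only three of the six symbols are mimicked.

```cpp
// lattice_ring_cuda_stub.cpp (sketch): CPU-only bodies for the CUDA C ABI.
// All names and signatures below are HYPOTHETICAL stand-ins for the real
// lattice_ring_cuda_* declarations; only three of the six are shown.
#include <stddef.h>
#include <stdint.h>

// Hypothetical status code mirroring the "-1 NOTIMPL" convention.
constexpr int kLatticeRingCudaNotImpl = -1;

extern "C" {

// No CUDA backend in this build: tells the dispatcher to take the CPU path.
int lattice_ring_cuda_available(void) { return 0; }

// Forward NTT entry point. Never expected to run in a CPU-only build, so it
// fails loudly instead of silently leaving the buffer untouched.
int lattice_ring_cuda_ntt_forward(uint64_t* data, size_t n, uint64_t q) {
  (void)data;
  (void)n;
  (void)q;
  return kLatticeRingCudaNotImpl;
}

// Multiplication entry point: same NOTIMPL behaviour.
int lattice_ring_cuda_mul(const uint64_t* a, const uint64_t* b, uint64_t* out,
                          size_t n, uint64_t q) {
  (void)a;
  (void)b;
  (void)out;
  (void)n;
  (void)q;
  return kLatticeRingCudaNotImpl;
}

}  // extern "C"
```

Returning an error code rather than aborting keeps the stub linkable and testable, while still making an accidental call visible to the caller.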
Removes 303 files across 29 schemes' gpu/{cuda,metal,wgsl}/ subtrees and
math/ntt/cuda/. The kernels now live in lux-private/gpu-kernels and are
symlinked back into <scheme>/gpu/<backend>/ by the find_package hook in
crypto/CMakeLists.txt (PR #1) when lux-gpu-kernels is found at configure time.
math/ntt/cuda/lattice_ring_driver.h kept as a minimal public C ABI
declaration so fhe/cpp/backends/cuda/cuda_ntt_kernel.cpp can resolve
the extern "C" lattice_ring_cuda_* signatures it dispatches against —
the bodies come from either the real lattice_ring_cuda target (CUDA on)
or fhe/cpp/backends/cuda/lattice_ring_cuda_stub.cpp (CUDA off).
pedersen/CMakeLists.txt: wrap WGSL block in CRYPTO_ENABLE_WGSL and drop
the CUDA else() stub branch (cleaned up alongside the rest, missed in
the previous gating commit).
CPU-only verification (no CMAKE_PREFIX_PATH, lux-private absent):
- configure: clean
- build: 446/446 targets
- ctest: 70/70 passing (35.25s wall, includes pulsar/fhe/bls/slhdsa KATs)
Two consumers reach for math/ntt/cuda/lattice_ring_driver.h with relative
paths that broke after the prior commit deleted the in-tree gpu/ subtrees:
* math/ntt/c-abi/c_math_ntt.cpp — Go cgo bridge for the math/ntt
backends. Uses __has_include + LUX_MATH_NTT_HAVE_{CUDA,METAL,WGSL}
guards so each backend's surface compiles to its body only when the
driver header is present; absent backends fall through to a -2
(not-built) stub. Always-built target.
* fhe/cpp/backends/cuda/cuda_ntt_kernel.cpp — moved the lattice_ring
CUDA C ABI declarations next to the stub bodies
(fhe/cpp/backends/cuda/lattice_ring_cuda_decls.h, renamed from the
short-lived math/ntt/cuda/lattice_ring_driver.h stub) so the FHE
  dispatcher can build without depending on math/ntt/cuda existing
  as a real directory. math/ntt/cuda is a symlink into lux-private
  when CRYPTO_ENABLE_CUDA=ON and is absent otherwise; the discovery
  hook only creates the symlink if the path does `NOT EXISTS`, so a
  real header left in math/ntt/cuda would block the symlink.
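A sketch of how an FHE-side dispatcher can consume those declarations: it probes the availability function and routes to a CPU path when that reports 0, so the same caller links against either the real CUDA target or the stub. The stub bodies are inlined here only so the example links on its own. The function names, the dispatcher `ring_mul`, the CPU oracle `cpu_ring_mul` (a pointwise, NTT-domain product), and the assumption q < 2^32 (so the product of two reduced operands fits in 64 bits) are all hypothetical, not taken from the FHE sources.

```cpp
#include <stddef.h>
#include <stdint.h>

// ---- lattice_ring_cuda_decls.h (sketch): C ABI declarations only ----
// HYPOTHETICAL names/signatures standing in for the real declarations.
extern "C" {
int lattice_ring_cuda_available(void);
int lattice_ring_cuda_mul(const uint64_t* a, const uint64_t* b, uint64_t* out,
                          size_t n, uint64_t q);
}

// ---- lattice_ring_cuda_stub.cpp (sketch): CPU-only bodies ----
// In a CUDA build these definitions would come from the real
// lattice_ring_cuda target instead.
extern "C" int lattice_ring_cuda_available(void) { return 0; }
extern "C" int lattice_ring_cuda_mul(const uint64_t*, const uint64_t*,
                                     uint64_t*, size_t, uint64_t) {
  return -1;  // NOTIMPL: unreachable while available() reports 0
}

// CPU oracle (hypothetical): pointwise product mod q. Assumes q < 2^32 so
// (a % q) * (b % q) cannot overflow uint64_t.
static void cpu_ring_mul(const uint64_t* a, const uint64_t* b, uint64_t* out,
                         size_t n, uint64_t q) {
  for (size_t i = 0; i < n; ++i) {
    out[i] = (a[i] % q) * (b[i] % q) % q;
  }
}

// Dispatcher (hypothetical): take the GPU path only when the backend reports
// itself available; otherwise fall back to the CPU oracle and return 0.
int ring_mul(const uint64_t* a, const uint64_t* b, uint64_t* out, size_t n,
             uint64_t q) {
  if (lattice_ring_cuda_available() != 0) {
    return lattice_ring_cuda_mul(a, b, out, n, q);
  }
  cpu_ring_mul(a, b, out, n, q);
  return 0;
}
```

Usage: with a = {1, 2, 3, 4}, b = {5, 6, 7, 8} and q = 17, `ring_mul` takes the CPU path and writes {5, 12, 4, 15}.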
math/CMakeLists.txt turns the EXISTS-gated math_ntt_c_abi target back into
an always-built target, since the source self-guards. Per-backend include
paths and link_libraries stay conditional.
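The self-guarding pattern described for c_math_ntt.cpp can be sketched as below, assuming C++17 (or a compiler that offers `__has_include` as an extension). The header path and the LUX_MATH_NTT_HAVE_CUDA macro come from the text above; the exported function `lux_math_ntt_cuda_forward`, the driver call `lattice_ring_cuda_ntt_forward`, and `LUX_MATH_NTT_NOT_BUILT` are hypothetical. In a CPU-only checkout the header is absent, the macro stays undefined, and the entry point compiles to the -2 (not-built) stub.

```cpp
#include <stddef.h>
#include <stdint.h>

// Probe for the CUDA driver header. It only exists when the discovery hook
// has symlinked math/ntt/cuda from lux-private.
#if defined(__has_include)
#if __has_include("math/ntt/cuda/lattice_ring_driver.h")
#include "math/ntt/cuda/lattice_ring_driver.h"
#define LUX_MATH_NTT_HAVE_CUDA 1
#endif
#endif

// HYPOTHETICAL status code for "backend not built into this binary".
#define LUX_MATH_NTT_NOT_BUILT (-2)

// cgo-facing entry point (HYPOTHETICAL name). The symbol always exists, so
// the Go side links in every configuration and branches on the return code.
extern "C" int lux_math_ntt_cuda_forward(uint64_t* data, size_t n,
                                         uint64_t q) {
#if defined(LUX_MATH_NTT_HAVE_CUDA)
  // Header present: forward to the driver (HYPOTHETICAL signature).
  return lattice_ring_cuda_ntt_forward(data, n, q);
#else
  // Header absent: compile to the not-built stub.
  (void)data;
  (void)n;
  (void)q;
  return LUX_MATH_NTT_NOT_BUILT;
#endif
}
```

The METAL and WGSL surfaces would repeat the same shape with their own header probes and LUX_MATH_NTT_HAVE_* macros. Because the file compiles in every configuration, the target itself no longer needs an EXISTS gate.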
Verification: cevm-genesis-parity PASSES under both modes
(canonical state_root = 0x2d1ced... ; canonical genesis hash = 0x3f4fa2...).
CMake's find_package(lux-gpu-kernels) hook in the top-level CMakeLists symlinks each scheme's gpu/<backend>/ from the lux-private install prefix at configure time. Those symlinks are generated artifacts and are never committed; ignore them so 'git status' stays clean across CPU-only and with-private builds.
Summary
Adds a non-breaking CMake discovery hook for the proprietary kernel repo
lux-private/gpu-kernels. Public crypto remains buildable as CPU-only when
the private repo is absent; when it is present (via CMAKE_PREFIX_PATH),
the per-scheme gpu/<backend> subdirs are symlinked from the install root.
Validation
- Private repo absent (`-- lux-gpu-kernels: NOT FOUND — CPU-only build`):
  cevm-genesis-parity builds and runs to PASS for both state root and
  genesis hash (byte-equal to baseline).
Follow-ups (not blocking this PR)
- Gate per-scheme add_library(... gpu/...) calls behind CRYPTO_ENABLE_*
  (so the in-tree gpu/ directories can be deleted).
- … becomes lux-private/gpu-kernels.

Commercial license: licensing@lux.network